Comment on The origin of bursts and heavy tails in human dynamics 



In a recent letter, Barabasi claims that the dynamics of a number of human activities are 
scale-free . He specifically reports that the probability distribution of time intervals r between 
consecutive e-mails sent by a single user and time delays for e-mail replies follow a power-law, 
P(t) r~" with a ~ 1, and proposes a priority-queuing process as an explanation of the "bursty" 
nature of human activity. Here, we quantitatively demonstrate that the reported power-law distri- 
butions are solely an artifact of the analysis of the empirical data and that the proposed model is 
not representative of e-mail communication patterns. 

Barabasi analyzed the email communication patterns of a subset of users I2I] in a database 
containing the email usage records of 3188 individuals using a university e-mail server over an 
83-day period |i3]. Upon examining the same data, we find a number of significant deficiencies in 
his analysis. These deficiencies were communicated to Barabasi well in advance of publication lO]. 
For example, even though the data have a resolution of one second, the statistical analysis reported 
in Fig. 2 of Ref. li indicates that the most frequent time interval between consecutive e-mails 
sent by the same user occurs for time intervals smaller than one second. Even more surprisingly, 
the user considered in Fig. 2 of Ref. 1 1] appears to respond to e-mails most frequently for times 
smaller than five seconds. We verified that such time intervals are too short to permit a person to 
write and send consecutive e-mails, much less read, write, and reply to an e-mail. 

n 

Unfortunately, these are not the only problems with the claims of Ref. flj. Barabasi claims that 
the time series of the typical user is well-described by a power-law distribution with an exponent 
a ~ 1. This claim is revised in more recent work, which suggests that the power-law is modified 
by an exponential truncation jslQ]. Our own analysis of the same empirical data used in Ref. Ill] 
suggests that a log-normal distribution provides a significantly better description of the data. 

Our hypothesis of a log-normal distribution may also be more appropriate for describing the 
activity of users that rely on e-mail for daily communication for the following reasons. To our 
knowledge, there are no studies reporting that a real-world process is well-described by a power- 
law with an exponent a ~ 1. An apparent scaling exponent a c:^ \ and concave curvature depicted 
in Fig. 2A of Ref. Ill] and numerous figures of Ref. |ct] are, however, characteristics of a log-normal 
distribution, which are representative of many real-world processes |3]- Log-normal distributions 
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are easily identifiable by examining the probability density of s = ln(r). Under this transfor- 
mation, a power-law with exponent a 1 would be a uniform distribution of s. It is visually 
apparent that P{s) is not uniform, but rather Gaussian in form (see Supplementary Information). 
The Gaussian form of this distribution suggests that users send consecutive e-mails with a charac- 
teristic time (r ^ 45 minutes for the user in Fig.[T]) as opposed to Barabasi's contention that users 
send e-mails without a characteristic scale. 

We conduct a Bayesian model selection analysis to decide between the two proposed descrip- 
tions of the data jslQ]. As in Ref. Ill, we analyze both the time intervals between consecutive 
e-mails sent and the time required to reply to an e-mail. To be as considerate as possible with 
the analysis of Ref. jlj], we assume prior probabilities of 90% for the truncated power-law model 
and 10% for the log-normal model. A more stringent comparison would give each model equal 
likelihood of describing the data in the absence of additional information. Furthermore, we restrict 
the time domain for our analysis of the power-law whereas we consider the entire time domain for 
our analysis of the log-normal distribution. We find that the posterior probability of the log-normal 
description being correct is indistinguishable from one within the computer's numerical precision. 

Additionally, we calculate posterior probabilities as a function of the magnitude of the power- 
law domain. As we show in Fig.[Tl the log-normal distribution provides a better description of the 
data than a power-law except when less than one order of magnitude is considered for the analysis 
of the power-law (see Fig.[T]and the Supplementary Information for full details of this analysis). 

We next discuss the priority-queuing model which reportedly explains the mechanism behind 
the reply times in e-mail communication. Before addressing the details of the model, however, 
we would like to emphasize that the model predicts a power-law for the distribution of response 
time delays, not the empirically observed log-normal distribution. These predictions of the model 
are supported by recent analytical work by Vazquez et al. jsl 0] . As we demonstrate above, that 
prediction is not supported by the data. 

The priority-queuing model is not only unrealistic in its prediction of the functional form of the 
distribution of time delays for e-mail responses. After an initial transient period, new tasks in the 
model are typically executed immediately after arrival, resulting in a pronounced peak at r = 1; 
these tasks are said to represent e-mails which are either immediately replied to or deleted 
In the case that reportedly best captures human dynamics iQ], that is, when e ^ 0, where e is 
the probability of executing a randomly- selected task instead of the highest-priority task, this peak 
contains 99.9% of the tasks handled by the user — an unrealistic scenario. Moreover, upon reaching 
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steady state, the distribution of task priorities on the queue converges to a uniform distribution in 
the interval [0, e]. The model thus predicts that the typical e-mail user has a queue filled with 
extremely low priority tasks and consequently performs all new incoming tasks immediately upon 
arrival. This situation is also not representative of typical human behavior. 

E-mail communication patterns are a valuable proxy for the study of human behavior and 
decision-making. The idea of humans relying solely on a priority-queuing procedure Iil0| to man- 
age their complex activity is interesting. Unfortunately, we find that even though Ref. [1] is quite 
stimulating, none of the results it reports hold upon further inspection. 
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ln(T) Orders of magnitude considered 
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FIG. 1: Bayesian model selection protocol for comparing the power-law with exponent a = 1 and the 
log-normal models, a, Probability density of the time intervals r between consecutive e-mails sent for a 
user who sent approximately 8 e-mails per day during the study period |3]. The region shaded in grey 
T € [5, 50000] corresponds to the range reported in 1 1, 5, 6] to be well-approximated by a power-law with 
exponent a = 1. b, To investigate the validity of the log-normal hypothesis, we take advantage of the fact 
that the probability function P{s) of the logarithm of the time interval s = ln(r) should be Gaussian. It is 
visually apparent that the probability function for the user considered in a (black curve) is well-described by 
a Gaussian (dashed red curve), c, To investigate the validity of the power-law with an exponent a = 1, we 
can perform a similar transformation and analysis. If the data found in the grey shaded region of a were well- 
described by a power-law with exponent a = 1, the P{s) would be linear with slope {smax — Smm)~^- It is 
visually apparent that the probability function for the data in the range ln(5) < s < ln(50000) is not linear, 
d, Even when the considered data range is reduced one order of magnitude to ln(50) < s < ln(50000) the 
data are still not linear, e, We conduct Bayesian model selection analysis between the two candidate models 
for all 1202 users who sent more than 10 messages during the study period |3]. When considering the time 
interval between consecutive emails, we find that one would effectively always select the log-normal model 
over the power-law model, except when the power-law model is used for one order of magnitude or less 
in the time domain, f. We perform a similar analysis for all 760 users who reply to at least 10 messages 
during the study period |3]. When considering the time delay for e-mail replies, we find that one would 
always select the log-normal model over the power-law model, even when considering just one order of 
magnitude of the data. It is critical to note that in the analysis of both e and f we consider all user data for 
the log-normal model in contrast to the reduced range considered for the power-law model. 
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